Exploiting Contextual Information for Speech/Non-Speech Detection

Authors

  • Sree Hari Krishnan Parthasarathi
  • Petr Motlícek
  • Hynek Hermansky
Abstract

In this paper, we investigate the effect of temporal context on speech/non-speech detection (SND). We show that even a simple feature such as full-band energy, when employed with a large enough context, shows promise for further investigation. Experimental evaluations on the test data set, measured by F-measure, show an absolute performance gain of 4.4% over a state-of-the-art multi-layer perceptron (MLP) based SND system and 5.4% over a simple energy-threshold-based SND method. The optimal context length was found to be 1000 ms. Further numerical optimizations yield an additional improvement of 3.37% absolute, resulting in absolute gains of 7.77% and 8.77% over the MLP-based and energy-based methods respectively. ROC-based performance evaluation also reveals promising performance for the proposed method, particularly in low-SNR conditions.
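
To make the idea of "energy plus temporal context" concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not say how the context is used, so averaging the frame log-energy over a centered window is purely an assumption, as are the 10 ms frame length, the threshold value, and every function name below. The `f_measure` helper uses the standard definition (harmonic mean of precision and recall) on frame-level decisions.

```python
import math


def frame_log_energy(samples, sample_rate, frame_ms=10):
    """Full-band log energy (dB) of each non-overlapping frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on digital silence.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def add_context(values, context_frames):
    """Assumed context model: mean over a centered window of frames."""
    half = context_frames // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def detect_speech(samples, sample_rate, threshold_db,
                  frame_ms=10, context_ms=1000):
    """Frame-level speech decisions from context-averaged energy."""
    energies = frame_log_energy(samples, sample_rate, frame_ms)
    smoothed = add_context(energies, max(1, context_ms // frame_ms))
    return [e > threshold_db for e in smoothed]


def f_measure(predicted, reference):
    """Harmonic mean of precision and recall over frame decisions."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if r and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With `context_ms=1000` and 10 ms frames, each decision draws on roughly 100 neighboring frames. This damps short energy spikes and dips, at the cost of blurring speech boundaries. The trade-off belongs to this sketch and is not a result reported in the abstract.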

Similar resources

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...

A robust frontend for VAD: exploiting contextual, discriminative and spectral cues of human voice

Reliable automatic detection of speech/non-speech activity in degraded, noisy audio signals is a fundamental and challenging task in robust signal processing. As various speech technology applications rely on the accuracy of a Voice Activity Detection (VAD) system for their effectiveness and robustness, the problem has gained considerable research interest over the years. It has been shown that...

Exploiting contextual information for prosodic event detection using auto-context

Prosody and prosodic boundaries carry significant information regarding linguistics and paralinguistics and are important aspects of speech. In the field of prosodic event detection, many local acoustic features have been investigated; however, contextual information has not yet been thoroughly exploited. The most difficult aspect of this lies in learning the long-distance contextual dependenci...

The Impact of Contextual Variables on Internal Intensification of Apology Speech Acts in Persian: Social Distance and Severity of Offense in Focus

The current paper primarily provides an account of how apology speech acts are internally intensified in Persian. Moreover, the study checks to what extent contextual variables, namely social distance and severity of offense, may motivate the internal intensification of apology speech acts. To these ends, the study collected the required speech acts through a Discourse Completion Test (DCT) fro...

SVM-based speech endpoint detection using contextual speech features

Shown is an effective speech endpoint detection algorithm using a trained support vector machine (SVM) and a feature vector including contextual information speech features. With this and other innovations the proposed algorithm yields high discrimination and reports significant improvements over standard methods and algorithms defining the decision rule in terms of averaged subband speech feat...

Publication date: 2008